[Serve][3/N] Add application-level autoscaling snapshot #59995
Description
Add application-level autoscaling snapshot support for observability.
This PR extends the existing deployment-level autoscaling snapshot feature (PR #56225) to support application-level autoscaling. When an app-level autoscaling policy is configured, the controller now emits `ApplicationSnapshot` logs containing aggregated metrics across all deployments in the application.
Related issues
Related to #55833
Additional information
```bash
% cat /tmp/ray/session_latest/logs/serve/autoscaling_snapshot_6668.log
{"asctime": "2026-01-09 13:56:19,481", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 0, 'total_target_replicas': 2, 'scaling_status': 'scaling up', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579481838000}
{"asctime": "2026-01-09 13:56:19,999", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 2, 'total_target_replicas': 2, 'scaling_status': 'stable', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579999085000}
```
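For anyone consuming these logs, here is a minimal sketch of how one log line might be parsed. It assumes, based only on the sample output above, that each line is a JSON object whose `message` field holds a Python-literal dict (single-quoted) rather than nested JSON, which is why `ast.literal_eval` is used for the inner payload. The helper name `parse_snapshot_line` is hypothetical and not part of Ray Serve.

```python
import ast
import json

# First log line from the sample output above, copied verbatim.
SAMPLE_LINE = """{"asctime": "2026-01-09 13:56:19,481", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 0, 'total_target_replicas': 2, 'scaling_status': 'scaling up', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579481838000}"""


def parse_snapshot_line(line: str) -> list:
    """Hypothetical helper: return the list of snapshot dicts in one log line."""
    # The outer record is ordinary JSON emitted by the logger.
    record = json.loads(line)
    # Assumption: the message is a Python dict repr (single quotes), so it is
    # evaluated as a literal instead of being parsed as JSON.
    payload = ast.literal_eval(record["message"])
    return payload["snapshots"]


if __name__ == "__main__":
    for snap in parse_snapshot_line(SAMPLE_LINE):
        if snap["snapshot_type"] == "application":
            print(
                snap["app"],
                snap["scaling_status"],
                f'{snap["total_current_replicas"]}/{snap["total_target_replicas"]}',
            )
```

If the message format changes to nested JSON in a later revision, the `ast.literal_eval` call would need to be swapped for `json.loads`.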